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Abstract 

Marsaglia recently introduced a class of "xorshift" random number 
generators (RNGs) with periods 2 n — 1 for n = 32, 64, etc. Here 
Marsaglia's xorshift generators are generalised to obtain fast and high- 
quality RNGs with extremely long periods. Whereas RNGs based 
on primitive trinomials may be unsatisfactory, because a trinomial 
has very small weight, these new generators can be chosen so that 
their minimal polynomials have a large number of non-zero terms and, 
hence, a large weight. A computer search using Magma has found 
good RNGs for n a power of two up to 4096. These RNGs have been 
implemented in a free software package xorgens. 
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1 Introduction 



Marsaglia [IT] proposed a class of uniform random number generators called 
"xorshift RNGs" . Their implementation requires only a small number of left 
shifts, right shifts and "exclusive or" operations per pseudo-random number. 

Assume that the computer wordlength is w bits (typically w = 32 or 64). 
Marsaglia's xorshift RNGs have period 2 n — 1, where n is a small multiple of 
w, say n = rw. 

The present author [3j showed that Marsaglia's xorshift RNGs are a spe- 
cial case of the well-known linear feedback shift register (LFSR) class of 
RNGs. This was also observed by Panneton and L'Ecuyer [T4]. However, 
the xorshift RNGs have implementation advantages because n (the number 
of state bits) is a multiple of the wordlength w. In contrast, for RNGs based 
on primitive trinomials, the corresponding parameter n can not be a multiple 
of eight (due to Swan's theorem [131 EE]) and is usually an odd prime. For 
example, n = 19937 in the case of the "Mersenne twister" [12]. The "tem- 
pering" step of the Mersenne twister can be omitted in the xorshift RNGs. 
Thus, the xorshift RNGs are simpler and potentially faster. 

Any RNG based on a finite state must eventually cycle, but it is desirable 
for RNGs to have a very long period T (the cycle length). Most generators fail 
certain statistical tests if more than about T 1//2 random numbers are used [H] , 
or perhaps even about T 1 / 3 numbers for the "birthday spacings" test [9]. 
For generators satisfying linear recurrences such as the LFSR generators 
with period 2 n — 1, there is a linear relationship between blocks of n + 1 
consecutive bits, so the generator may fail statistical tests that detect this 
linear relationship. Also, on a parallel machine we may want to use disjoint 
segments of the cycle on different processors, and if this is done by starting 
with different seeds on each processor, we want the probability that two 
segments overlap to be negligible. For all these reasons it is important for n 
to be large. The generators that we describe below have n as large as 4096 
which is enough, but not so large that the generators are slowed down by 
memory accesses. This is possible if the computer's memory cache is smaller 
than n bits. 

Marsaglia's original proposal [TT] discusses mainly the case n < 64, but 
an extension to larger n is suggested, The present author has implemented 
a generalisation xorgens (4] with n < 4096, in particular we can choose any 
power of two n = 2 k for 6 < k < 12. The problem in going to larger n is 
that we need to know the complete prime factorisation of 2 n — 1 in order 
to be sure that the generator's period is maximal. These factorisations are 
known for all multiples of 32 up to 1632 and for certain larger n, see [IB] . If 
we restrict n to powers of two then it is sufficient to know the factorisations 
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of certain Fermat numbers = 2 2 +1, since for example 

2 4096 _ 1 = (2 2048 + 1)(2 2048 _ ^ = ^20*8 _ y = . . . = p^p^ . . . Fl P Q . 

The factorisations of the Fermat numbers F , . . . , F u are known [2]. 

Panneton and L'Ecuyer [T4] tested Marsaglia's xorshift RNGs and found 
certain deficiencies, but they did not find any significant problems with our 
xorgens generators for n > 128. 

In Section [2] we introduce some notation, summarise the relevant theory, 
and describe the class of RNGs implemented in our xorgens package. In 
Section [3] we discuss criteria for the selection of "optimal" generators in the 
class, and give specific examples of optimal generators for various n < 4096 
and w = 32, 64. Finally, in Section |4] we discuss a known weakness of xorshift 
RNGs and mention some possible improvements. 

2 Notation and theory 

Let F 2 = GF(2) be the finite field with two elements {0, 1}. We usually write 
addition in F 2 as +, but we use © if it is necessary to distinguish it from 
normal integer addition. If is interpreted as "false" and 1 as "true", then 
the field operations are "exclusive or" (xor or ©) and "and" (A). A computer 
word of w bits can be regarded as a vector x of length w over F 2 . We shall 
identify a bit-vector x with the corresponding integer (and vice-versa) when 
necessary. 

Our RNGs generate pseudo-random bit-vectors x, but these easily give 
pseudo-random unsigned integers x G [0, 2 W ), pseudo-random signed integers 
x — 2 w ~ l G [— 2 W ~ 1 , 2 W ~ 1 ), or (by a linear transformation) pseudo-random 
real numbers in (0, 1). 

Unfortunately there are two conventions for bit- vectors x: Marsaglia [11] 
uses row vectors x G F 2 lxw , but Panneton and L'Ecuyer [T4] use column 
vectors x G F^ xl - We shall follow Marsaglia and take x as a row vector. (To 
convert to column vector notation, transpose all equations involving vectors 
and matrices.) 

Fix parameters r > s > (the choice of these will be discussed below), 
and consider the linear recurrence 

x (k) = x {k-r) A + x (k-s) B ; ^) 

where G F 2 lxw . Here A and B are fixed matrices in F 2 XW . Given x^°\ 
. . ., x^" 1 ), the recurrence (pQ) uniquely defines the sequence (x^)k>o- 



3 



Let L e F™ xw be the left shift matrix 

/ ••• \ 
1 ■■■ 



such that 



(xi, . . . , x w )L — (X2, • • • , x w , 0) . 



Similarly, let R = L T be the right shift matrix such that 

(xi, . . . , x w )R = (0, xi, . . . , x w -i) . 

Marsaglia's idea is to take A and B as products of a small number of 
terms such as (I + L a ) and (I + i? 6 ). Specifically, let us take 



A= (I + L a )(I + R b ) 



and 



B = (I + L c )(I + R d ) 

for small positive integer parameters a,b,c,d. Marsaglia [TTJ §3.1] omits the 
factor I + L c ; we include it for reasons of symmetry, to increase the number 
of possible choices (see §3]), and to improve properties related to Hamming 
weight (see §1]). 

With our choice of A and B, the recurrence ([TJ) becomes 



x (k) = x (k-r)(j + L a^j + ^ + x (k-s)(j + + _ 



(2) 



Note that, if x is a bit-vector of length w, then xL a is just x shifted left 
a places (xL a = if a > w), and x(J + L a ) is the xor of x and xL a . The 
operation 

x <- x(J + L a ) 



can be written in C as 
or more succinctly as 



x << a 



x << a 



(here x is represented in a computer word x which C treats as an unsigned 
integer). Similarly x <— x(I + R b ) can be written in C as x A = x » b, and 

x <h- x(I + L a )(I + R b ) 
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can be written in C as 

x ~= x << a ; x '= x >> b . 

The recurrence ([]]) is best implemented using a "circular array", that is 
an array where the indices are computed mod r (see [7J §3.2.2]), unless r is 
very small. 

It is well-known that we can write the recurrence ([T]) as 

, x (k-r+l)\ x {k-r+2)\ . . . | x (fch = f x {k-r)\ x (k-r+l)\ . . . I^Cfc— ^ (3) 

where the companion matrix C G _F" xn can be regarded as an r x r matrix 
oi w x w blocks (recall that n = rw). For example, if r = 3 and s — 1, then 

/ A 

(x( fc ~ V fc ~ V fc) ) = (^ (fc ~V fc ~ V fe_1 M i o o 

\ / B 

The period of the recurrence (J3j) is 2™ — 1 if the characteristic polynomial 

P{z) = det [C - zl) 

is primitive over F 2 . P{z) is primitive if it is irreducible and the powers 
z, z 2 , z 3 , . . . , z 2 " -1 are distinct mod P{z). To verify this, without checking 
2 n — 1 cases, it is sufficient to show that P{z) is irreducible and 

z (2 n -l)/ P _t l mod p( z ) 

for each prime divisor p of 2 n — 1: see Lidl [S] or Menezes [121 §4.5]. 
Suppose that 

n 

P(z) = J2c^. 

j=0 

From the Cayley-Hamilton theorem, P(C) = 0, so 

rt 

3=0 

It follows from <^ that 

( T U) I r (j+l) I . . . UO'+r-lh - f T (0)l T (l)| . . . \ T (r-l)\f>j 

so 

j2 CjX (^) = . 

3=0 

This shows that the pseudo-random sequence x^ k ' satisfies a linear recurrence 
over F 2 . For a good random number generator it is important that the weight 
W(P(z)) of the polynomial P(z), i.e. the number of nonzero coefficients Cj, 
is not too small [31E1CEI]. 
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3 Optimal generators 



Suppose the wordlength w and a parameter r > 2 are given, so n = rw is 
defined. We want to choose positive parameters (s, a, b, c, d) such that s < r 
and the RNG obtained from the recurrence (jSJ) has full period 2 n — 1. Of 
the many possible choices of (s, a, b, c, d), which is best? We give a rationale 
for making the "best" choice (or at least a reasonably good one, since often 
many choices are about equally good). 

1. Each bit in x(I + L a )(I + R b ) should depend on at least two bits in x, 
that is each column of the matrix (J + L a )(I + R b ) should have weight 
(number of nonzeros) at least two. A necessary condition for this is 
that a + b < w. Similarly, we require that c + d < w. 

2. Repeated applications of the transformation x x(I + L a )(I + R b ) 
should mix all the bits of the initial x (that is, after a large number 
of iterations each output bit should depend on each of the input bits). 
A necessary condition for this is that GCD(a,6) = 1. Similarly, we 
require that GCD (c,d) = 1. 

3. If (s,a,b,c,d) is one set of parameters, then (s,b,a,d,c) is associated 
with the same characteristic polynomial. We can assume that a > b, as 
otherwise we could interchange a -H- b, c -H- d to obtain an equivalent 
RNG. 

4. So that the left shift parameters (a and c) are not both greater than 
the right shift parameters (b and d) we also assume that c < d. 

5. In order that the bits in x(I + L a )(I + R b ) depend on bits as far away 
as possible (to both left and right) in x, we want to maximise min(a, b). 
Similarly, we want to maximise min(c, d). Thus, we try to maximise 
5 = min(a, b, c, d). 

6. As already discussed, once (a, b, c, d) are fixed, we want to choose s < r 
so that the generator has full period 2 n — 1. 

7. Finally, in case of a tie (two or more sets of parameters satisfying the 
above conditions with the same value of 5), we choose the set whose 
characteristic polynomial has maximum weight W. 

There might still be a tie, that is two sets of parameters satisfying the 
above conditions, with the same 5 and W values. However, because the 
weights W are quite large (see Tables HH2]) , this is unlikely and has not been 
observed. 
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Table 1: 32-bit generators. 



n 


r 


s 


a 


b 


c 


d 


5 


W 


64 


2 


1 


17 


14 


12 


19 


12 


31 


128 


4 


3 


15 


14 


12 


17 


12 


55 


256 


8 


3 


18 


13 


14 


15 


13 


109 


512 


16 


1 


17 


15 


13 


14 


13 


185 


1024 


32 


15 


19 


11 


13 


16 


11 


225 


2048 


64 


59 


19 


12 


14 


15 


12 


213 


4096 


128 


95 


17 


12 


13 


15 


12 


251 



Criteria [T] and lead to a simple search strategy. From criterion [T] we 
see that 5 < to/2, but criterion [S] is to maximise 5. Thus, we start from 5 = 
[w/2\ and decrease 5 by 1 until we find a quadruple of parameters (a, b, c, d) 
satisfying criteriadHH This involves checking 0((w/2— £) 4 ) possibilities since 
(a, b, c, d) G [5, w — <5] 4 . We then search for s satisfying criterion |6] (this is the 
most time-consuming step). There are r — 1 possibilities to check for each 
quadruple (a,b,c,d). If no s satisfying criterion [6] is found, we decrement 5 
and repeat the process. Once one satisfactory quintuple (s, a, b, c, d) has been 
found, we need only check other quintuples (V, a', b', c', d') with the same 6, 
and choose the best according to criteria [6] and UJ We need only consider 
s such that GCD(r, s) = 1, since this is a necessary (but not sufficient) 
condition for the characteristic polynomial to be irreducible. 

There might not be a solution satisfying all the conditions HH7J The 
number of candidates (s, a, b, c, d) is or order rw 4 , that is nw 3 since n = rw. 
The probability that a randomly chosen polynomial of degree n over F<i is 
primitive is between l/(nlogn) and 1/n, apart from constant factors [8| [T3] . 
Thus, if our characteristic polynomials behave like random polynomials of 
the same degree, we expect at least of order w 3 /logn solutions. For w > 32 
we have always been able to find a solution with w/2 — 5 < 9. Ifwis small, 
there may be no solution, for example there is no solution for w = 8, r = 6. 

The parameters for "optimal" random number generators with n a power 
of two (up to n = 4096) are given in Tables dH2J Parameters when n is not a 
power of two are available from the author's web site [3]. The computations 
were performed using Magma [lj. 

We do not recommend the RNGs with n < 128 since they may fail the 
matrix-rank test in the Crush testing package [6j HI] . However, no problems 
have been observed while testing the RNGs with n > 256. 
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Table 2: 64-bit generators. 



n 


r 


s 


a 


b 


c 


d 


5 


W 


128 


2 


1 


33 


31 


28 


29 


28 


65 


256 


4 


3 


37 


27 


29 


33 


27 


127 


512 


8 


1 


37 


26 


29 


34 


26 


231 


1024 


16 


7 


34 


29 


25 


31 


25 


439 


2048 


32 


1 


35 


27 


26 


37 


26 


745 


4096 


64 


53 


33 


26 


27 


29 


26 


961 



4 Problems and improvements 

The xorgens class of RNGs are easy to implement since only simple op- 
erations (left and right shifts and xors) on full words are required. Unlike 
RNGs based on primitive trinomials, their characteristic polynomials have 
high weight (see column T 1 of Tables HH2]) . Provided n > 256, they appear 
to pass all common empirical tests for randomness [HI QUI EE3] • 

However, the xorgens class, like Marsaglia's xorshift class, does have an 
obvious theoretical weakness. For x G -F 2 lxiu , define \\x\ \ to be the Hamming 
weight of x, that is the number of nonzero components of x. Then ||x — y\ | is 
the usual Hamming distance between vectors x and y. For random vectors 
x G -F 2 1X1U ) I Ml nas a binomial distribution with mean w/2 and variance w/4. 

Because the matrices (/ + L a ) and (/ + R b ) are sparse, they map vectors 
with low Hamming weight into vectors with low Hamming weight, in fact 
\\x(I + L a )\\ < 2||x||, \\x(I + R b )\\ < 2||x||, and consequently 

\\x(I + L a )(I + R b )\\ < 4||x|| . 

It follows that a sequence (x^) generated using the recurrence (T5]) satisfies 

||x (fe) || < A(\\x {k - r) \\ + \\x {k - s) \\) . 

Thus, the occurrence of a vector x^ with low Hamming weight is correlated 
with the occurrence of low Hamming weights further back in the sequence 
(with lags r and s). A statistical test could be devised to detect this behaviour 
in a sufficiently large sample. It is a more serious problem for the 32-bit 
generators than for the 64-bit generators, since the probability that a w-bit 
vector x has Hamming weight \\x\ \ < w/8 is 1.0 x 10~ 5 for w = 32, but only 
2.8 x 10~ 10 for w = 64. 

One solution, recommended by Panneton and L'Ecuyer [13], is to include 
more left and right shifts in the recurrence (T5]). This slows the RNG down, 



8 



but not by much, since most of the time is taken by loads, stores, and other 
overheads. Another solution, which we prefer, is to combine the output of 
the xorshift generator with the output of a generator in a different class, for 
example a Weyl generator which has the simple form 

w (k) = w {k-i) + u mod 2 w _ 

Here "+" means integer addition (mod 2 W ) and u is some odd constant (a 
good choice is an odd integer close to 2" , ~ 1 (v / 5 — 1)). The generators in our 
xorgens package return 

w mod z 

instead of simply x^ k \ Here 7 w/2 is a constant. This is better than 
returning + x^ k ' mod 2 W (as was done in an earlier version of xorgens) 
because the least significant bit of w^ k ' has period 2, but all bits of w^ k \l®E?) 
have a longer period (about 2 W I 2 \ and this period is relatively prime to the 
period 2™ — 1 of x^ k \ Thus each bit in the output should have high linear 
complexity [13J. 

Note that addition mod 2 W is not a linear operation on vectors over F 2 , so 
we are mixing operations in two algebraic structures. This is generally a good 
idea because it avoids regularities associated with linearity. For example, 
suppose we use one of Marsaglia's xorshift generators to initialise our state 
vector, and we do it three times with seeds s, s', s" satisfying s = s' © s"; 
then by linearity over F 2 our three sequences x,x',x" satisfy x = x' © x", 
which is clearly undesirable. This problem vanishes if the xorshift RNG used 
for initialisation is modified by addition (mod 2 W ) of a Weyl generator, as is 
done in the xorgens package. 

5 Conclusions 

We have shown how Marsaglia's xorshift RNGs can be generalised to give 
high-quality RNGs with extremely long periods (greater than 10 1232 ). The 
RNGs are fast and easy to implement because only word-aligned operations 
are used and no "tempering" step [12] is required. We discussed a potential 
problem related to correlation of outputs with low Hamming weights, and 
showed that this can be overcome by the simple expedient of combining the 
output of a generalised xorshift RNG with the output of a Weyl generator. An 
implementation of the resulting RNG is available in a free software package 
xorgens [3]. 
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